🧠 Memory Hierarchy - abnv · Scour

Memory Allocations and the Hairy Underbelly of Systems Programming. 📊Memory Profilers

medium.com·5d·

TurboQuant tackles the hidden memory problem that's been limiting your local LLMs 🧠Memory Models

xda-developers.com·2d·

Stack vs malloc: real-world benchmark shows 2–6x difference 📚Stack Data Structures

blog.stackademic.com·18h·DEV·

Embracing AI with Claude's C Compiler 📅Instruction Scheduling

chipsandcheese.com·23h·Hacker News·

Branch prediction: an insight into CPU performance 🔮Branch Predictors

medium.com·1d·

OSTEP Chapter 13: The Abstraction of Address Spaces 📊Register Machines

muratbuffalo.blogspot.com·6d·Blogger·

Pure C implementation of the TurboQuant paper (ICLR 2026) for KV cache compression in LLM inference. 🗺️Region Inference

github.com·21h·r/LocalLLaMA·

Scaling the DRAM wall: How Lenovo is supporting startups during global memory crunch 📊Memory Profilers

yourstory.com·2d·

Dissecting Nvidia Blackwell - Tensor Cores, PTX Instructions, SASS, Floorsweep, Yield 🔀SIMD Programming

newsletter.semianalysis.com·1d·

An Enticing Optimization For Linux Memory Reclaim On Today's Multi-Core Platforms 🧠Memory Consistency

phoronix.com·6d·r/linux·

Accelerate CPU-based AI inference workloads using Intel AMX on Amazon EC2 🗺️Region Inference

aws.amazon.com·2d·

The Pipeline Problem 🔀SIMD Programming

modular.com·3d·Hacker News·

Nvidia and its partners' KV Cache extenders ⚡Cache-Aware Algorithms

blocksandfiles.com·2d·

Bad Benchmarks and a Fourier-Analytic Framework for Characterizing the (Un)Hideability of Combinational-Logic Circuits 💾Cache-Oblivious Algorithms

eprint.iacr.org·2d·

From 300KB to 69KB per Token: How LLM Architectures Solve the KV Cache Problem 🧠Memory Models

news.future-shock.ai·4d·Hacker News·

The Sparsity Nexus: Bypassing O(N²) Attention with Judy Arrays ⚡Tokenizer Optimization

axwise.de·6d·Hacker News·

Memory Wall Gets Higher ⚡Cache-Aware Algorithms

semiengineering.com·6d·

TurboQuant: KV Cache Quantization to 3.5 Bits with Zero Accuracy Loss - ICLR 2026 📦Compact Data

darshanfofadiya.com·4d·Hacker News·

Peterc3-dev/rag-race-router: R.A.G-Race-Router [Adaptive Tri-Processor Inference Runtime] — Self-optimizing CPU+iGPU+NPU inference for AMD Ryzen AI 300 series 📊Profiling Tools

github.com·2d·Hacker News·

Taming the JVM Latency Monster 📊Memory Profilers

dzone.com·6d·
